A Language-Independent Feature Schema for Inflectional Morphology
Abstract
This paper presents a universal morphological feature schema that represents the finest distinctions in meaning that are expressed by overt, affixal inflectional morphology across languages. This schema is used to universalize data extracted from Wiktionary via a robust multidimensional table parsing algorithm and feature mapping algorithms, yielding 883,965 instantiated paradigms in 352 languages. These data are shown to be effective for training morphological analyzers, yielding significant accuracy gains when applied to Durrett and DeNero’s (2013) paradigm learning framework.
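As a rough illustration of what a feature mapping step of this kind might look like, the sketch below turns the row and column headers of one inflection-table cell into a language-independent feature bundle. It is a minimal sketch under stated assumptions, not the paper's algorithm: the label table, the tag names (such as `PST` and `SG`), and the functions `map_cell` and `universalize` are all hypothetical and chosen only for illustration.

```python
# Hypothetical mapping from table header labels to (dimension, value) pairs.
# The labels and tag names are illustrative assumptions, not the paper's schema.
LABEL_TO_FEATURE = {
    "first person": ("person", "1"),
    "second person": ("person", "2"),
    "third person": ("person", "3"),
    "singular": ("number", "SG"),
    "plural": ("number", "PL"),
    "present": ("tense", "PRS"),
    "past": ("tense", "PST"),
}


def map_cell(row_labels, column_labels):
    """Combine the row and column headers of one table cell into a feature bundle."""
    features = {}
    for label in list(row_labels) + list(column_labels):
        key = label.strip().lower()
        if key not in LABEL_TO_FEATURE:
            continue  # unknown labels are simply skipped in this sketch
        dimension, value = LABEL_TO_FEATURE[key]
        features[dimension] = value
    return features


def universalize(lemma, table):
    """Turn a parsed table into (lemma, form, features) triples.

    `table` maps (row_labels, column_labels) tuples to an inflected form.
    """
    return [
        (lemma, form, map_cell(rows, cols))
        for (rows, cols), form in table.items()
    ]
```

A parsed table for one lemma could then be passed to `universalize` to obtain triples whose feature bundles are comparable across languages. A real system would also need to handle multi-word headers, merged cells, and labels that are ambiguous between dimensions.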
Similar resources
A Universal Feature Schema for Rich Morphological Annotation and Fine-Grained Cross-Lingual Part-of-Speech Tagging
Semantically detailed and typologically-informed morphological analysis that is broadly applicable cross-linguistically has the potential to improve many NLP applications, including machine translation, n-gram language models, information extraction, and co-reference resolution. In this paper, we present a universal morphological feature schema, which is a set of features that represent the fin...
Representing Lexical Knowledge for Bulgarian Inflectional Morphology in DATR
The paper analyses the application of the DATR language for lexical knowledge representation for interpreting Bulgarian inflectional morphology. It discusses the semantic network of the feature of definiteness in the Bulgarian language and compares the lexical knowledge representation for the different parts of speech with respect to the defined grammar rules, the sound alternations, the related formal pres...
Recognition and Generation of word form for natural language understanding systems: Integrating two-level morphology and feature unification
A language-independent morphological component for the recognition and generation of word forms is presented. Based on a lexicon of morphs, the approach combines two-level morphology and a feature-based unification grammar describing word formation. To overcome the heavy use of diacritics, feature structures are associated with the two-level rules. These feature structures function as filters f...
Discriminative n-gram language modeling for Turkish
In this paper, Discriminative Language Models (DLMs) are applied to the Turkish Broadcast News transcription task. Turkish presents a challenge to Automatic Speech Recognition (ASR) systems due to its rich morphology. Therefore, in addition to word n-gram features, morphology-based features like root n-grams and inflectional group n-grams are incorporated into DLMs in order to improve the langua...
Towards Unsupervised and Language-independent Compound Splitting using Inflectional Morphological Transformations
In this paper, we address the task of language-independent, knowledge-lean and unsupervised compound splitting, which is an essential component for many natural language processing tasks such as machine translation. Previous methods on statistical compound splitting either include language-specific knowledge (e.g., linking elements) or rely on parallel data, which results in limited applicabilit...
Publication date: 2015